

Masked Image Modeling Supplementary Material (Anonymous Author(s))

Neural Information Processing Systems

We use the same setting for RevCol models of different sizes in MIM pre-training. The hyper-parameters generally follow [4, 2]. Table 3 shows the detailed training settings after MIM pre-training. We also show training settings on ImageNet-1K after ImageNet-22K fine-tuning. For semantic segmentation, we evaluate different backbones on the ADE20K dataset.




The Road Less Scheduled

Neural Information Processing Systems

So from this viewpoint, the Schedule-Free updates can be seen as a version of momentum that has the same immediate effect, but with a greater delay for adding in the remainder of the gradient.


Evolution Strategies at the Hyperscale

Sarkar, Bidipta, Fellows, Mattie, Duque, Juan Agustin, Letcher, Alistair, Villares, Antonio León, Sims, Anya, Cope, Dylan, Liesen, Jarek, Seier, Lukas, Wolf, Theo, Berdica, Uljad, Goldie, Alexander David, Courville, Aaron, Sevegnani, Karin, Whiteson, Shimon, Foerster, Jakob Nicolaus

arXiv.org Artificial Intelligence

We introduce Evolution Guided General Optimization via Low-rank Learning (EGGROLL), an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes for modern large neural network architectures with billions of parameters. ES is a set of powerful blackbox optimisation methods that can handle non-differentiable or noisy objectives with excellent scaling potential through parallelisation. Naïve ES becomes prohibitively expensive at scale due to the computational and memory costs associated with generating matrix perturbations $E\in\mathbb{R}^{m\times n}$ and the batched matrix multiplications needed to compute per-member forward passes. EGGROLL overcomes these bottlenecks by generating random matrices $A\in \mathbb{R}^{m\times r},\ B\in \mathbb{R}^{n\times r}$ with $r\ll \min(m,n)$ to form a low-rank matrix perturbation $A B^\top$ that is used in place of the full-rank perturbation $E$. As the overall update is an average across a population of $N$ workers, this still results in a high-rank update but with significant memory and computation savings, reducing the auxiliary storage from $mn$ to $r(m+n)$ per layer and the cost of a forward pass from $\mathcal{O}(mn)$ to $\mathcal{O}(r(m+n))$ when compared to full-rank ES. A theoretical analysis reveals our low-rank update converges to the full-rank update at a fast $\mathcal{O}\left(\frac{1}{r}\right)$ rate. Our experiments show that (1) EGGROLL does not compromise the performance of ES in tabula-rasa RL settings, despite being faster, (2) it is competitive with GRPO as a technique for improving LLM reasoning, and (3) EGGROLL enables stable pre-training of nonlinear recurrent language models that operate purely in integer datatypes.
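The abstract's low-rank trick can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: function and hyper-parameter names are ours, and for clarity it materializes the full perturbation $AB^\top$, whereas the whole point of EGGROLL is that a forward pass can instead compute $(xA)B^\top$ at $\mathcal{O}(r(m+n))$ cost while storing only $A$ and $B$.

```python
import numpy as np

rng = np.random.default_rng(0)

def lowrank_es_step(W, fitness_fn, pop=8, rank=4, sigma=0.1, lr=0.05):
    """One ES ascent step using low-rank perturbations A @ B.T.

    Sketch only: materializes A @ B.T for readability. The memory
    saving in the paper comes from never forming this product and
    storing just A (m x r) and B (n x r) per worker.
    """
    m, n = W.shape
    grad_est = np.zeros_like(W)
    for _ in range(pop):
        A = rng.standard_normal((m, rank)) / np.sqrt(rank)  # scaled so E has unit variance
        B = rng.standard_normal((n, rank))
        E = A @ B.T                                         # rank-r perturbation
        # antithetic sampling: evaluate fitness at +E and -E
        grad_est += (fitness_fn(W + sigma * E) - fitness_fn(W - sigma * E)) * E
    grad_est /= 2.0 * sigma * pop
    return W + lr * grad_est  # ascend the fitness estimate
```

Note how the averaging recovers expressiveness: each worker's perturbation has rank at most r, but the update sums pop independent rank-r matrices, so it is typically of rank min(pop * r, m, n), matching the abstract's "high-rank update" claim.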


The Road Less Scheduled. Aaron Defazio (Fundamental AI Research Team, Meta), Xingyu (Alice) Yang

Neural Information Processing Systems

Recently, Zamani and Glineur (2023) and Defazio et al. (2023) showed that the exact worst-case … Our approach uses an alternative form of momentum that replaces traditional momentum. So from this viewpoint, the Schedule-Free updates can be seen as a version of momentum that has the same immediate effect, but with a greater delay for adding in the remainder of the gradient.
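The snippet only gestures at the momentum interpretation, so a concrete sketch may help. The following is our own hedged reading of the Schedule-Free SGD update form (variable names and hyper-parameters are ours, not the authors'): gradients are evaluated at an interpolation y between the base SGD iterate z and a running average x, and the returned point is the uniform average x rather than z, which removes the need for a learning-rate schedule.

```python
import numpy as np

def schedule_free_sgd(grad_fn, w0, lr=0.1, beta=0.9, steps=1000):
    """Sketch of the Schedule-Free SGD update form.

    z: base SGD iterate; x: running average of z (the point returned);
    y: interpolation between z and x where gradients are evaluated.
    """
    z = np.asarray(w0, dtype=float).copy()
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # gradient evaluation point
        z = z - lr * grad_fn(y)           # plain, unscheduled SGD step
        c = 1.0 / t                       # uniform (schedule-free) averaging
        x = (1.0 - c) * x + c * z
    return x
```

In this reading, beta controls how much of each gradient acts immediately (through the averaged x inside y) versus being "delayed" in z and folded into x over later steps, which matches the snippet's description of momentum with the same immediate effect but a delayed remainder.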



Appendix. To ensure that our results can be replicated, we provide further experimental details and complementary …

Neural Information Processing Systems

Symbol | Definition
x ∈ 𝒳, u ∈ 𝒰, y ∈ 𝒴 | Input, model output, and desired output
p(x, y) | Joint probability distribution of input x and target or label y
f : 𝒳 → 𝒰, θ ∈ Θ | Model function with parameters θ in parameter space Θ
L(p, f) : (p, f) ↦ ℝ | …